Abstract

We describe the design and the implementation of PAGAI, a new static 
analyzer working over the LLVM compiler infrastructure, which computes 
inductive invariants on the numerical variables of the analyzed program. 

PAGAI implements various state-of-the-art algorithms combining abstract
interpretation and decision procedures (SMT-solving), focusing on the
distinction of paths inside the control flow graph while avoiding systematic
exponential enumerations. It is parametric in the abstract domain in
use, the iteration algorithm, and the decision procedure.

We compared the time and precision of various combinations of analysis
algorithms and abstract domains, with extensive experiments both on
personal benchmarks and widely available GNU programs.



1 Introduction 

Sound static analysis automatically computes properties of programs, such as
the possible values of their variables during execution. Applications include:
showing that a program cannot encounter a runtime error (such as arithmetic
overflow, division by zero, or array access out of bounds), as in e.g. the Astrée
analyzer; computing invariants for use with assisted proof systems (such
as the B method), thereby lessening the burden on the user; and computing
invariants for advanced optimization techniques in compilation (e.g. showing
that two array cells are distinct, in order to allow instruction reordering between
assignments to these cells). All these applications need invariants on numerical
quantities.

This article introduces PAGAI, a new tool for fully automatic static analysis.
PAGAI takes as input a program in the "bitcode" intermediate representation
of LLVM [13], a modern compilation framework. LLVM bitcode is a target
for several industrial-strength compilers, most notably Clang (supporting C,
C++, Objective-C and Objective-C++) and llvm-gcc (supporting, in addition
to these, Fortran and Ada); furthermore, a growing number of analysis tools,
testing tools, etc. are currently built around this platform (Calysto, KLEE,
LAV, LLBMC).

The output of PAGAI is a list of inductive invariants for a selected subset
of the control nodes of the original program: for structured source programs,
PAGAI will provide an inductive invariant for loop headers.

At present, PAGAI checks user-specified safety properties provided through
assertions using the standard C/C++ assert(condition) macro. The tool will
attempt to prove that the assertion failure is unreachable and, if unsuccessful,
provide a warning message (the tool does not at present include bounded
model checking or path exploration techniques for reconstructing an actual
failure trace, thus such a warning message should be interpreted as a possible
assertion failure). It also allows user-specified assumptions, through the
assume(condition) macro. Execution traces falsifying assertions or assumptions
are considered to terminate when executing the macro; thus, user-specified
assertions may be used to guide the analyzer by providing invariants that it was
not able to synthesize by itself. Possible extensions could include checking for
memory safety of array accesses.

PAGAI is based on abstract interpretation, a general framework for fully 
automatic static analysis. PAGAI infers invariants of a selected form; by default 
it performs linear relation analysis, which obtains invariants as conjunctions of 
linear inequalities (or, equivalently, convex polyhedra), but it also supports 
other abstract domains through a runtime option. Depending on the iteration 
algorithm selected, PAGAI may also infer invariants as disjunctions of elements 
of the abstract domain (e.g. unions of convex polyhedra). 

Textbook descriptions of abstract interpretation-based static analysis state
that an inductive invariant is computed at every control point of the program.
In contrast, PAGAI abstracts straight-line sequences of statements en bloc,
computing invariants only at points where control flow branches or merges.
Furthermore, several algorithms implemented in PAGAI compute invariants only at the
heads of loops (or, in a general control-flow graph, at nodes forming a feedback
vertex set, whose removal breaks all cycles in the graph), expanding the rest of the
control flow to a possibly exponential number of straight-line sequences of
statements between the selected nodes. In order to avoid explicit enumerations of
exponential sets, PAGAI uses decision procedures for arithmetic theories, based
on the satisfiability modulo theory (SMT) approach: each path is enumerated
only if needed, in response to a positive satisfiability query [16].

The PAGAI tool is dedicated to experimenting with new analysis algorithms.
It allows independent selection of the abstract domain and the iteration
strategy, and partially independent selection of the decision procedure, and thus is
well-suited for comparisons. We thus conducted extensive experiments both on
examples we produced ourselves (sometimes inspired by industrial code) and on
GNU programs, for which the ability to run on any C or C++ code, through
the LLVM system, was especially useful. Front-ends for many analysis tools impose
restrictions (e.g. no backward goto instructions, no pointer arithmetic...), often
satisfied by safety-critical embedded programs, but not by generic programs;
our tool suffers no such restrictions, though it may in some cases apply coarse
abstractions which may yield weak invariants.

^A preliminary analysis pass selects a subset of nodes that cuts all cycles in the control-flow
graph, by selecting all targets of back edges in a depth-first search traversal; when applied
to a structured program, it selects loop headers.

^Certain abstract domains express relationships, such as linear congruences, that certain
decision procedures cannot deal with. It is at present necessary that the decision procedure
reflects semantics at least as precise as those of the abstract domain. This limitation will be
lifted in the future.

After illustrating the limitations of traditional abstract interpretation on an
example in Section 2, we describe PAGAI's implementation in Section 3
and comment on the results of extensive experiments in Section 4, allowing the
comparison of state-of-the-art techniques on real-life programs.

2 Motivating Example 

In most forward abstract interpretation-based analyses, when control flows from 
several nodes into a single node, the abstract value at that node is obtained by 
computing the least upper bound of the incoming abstract values in the abstract 
domain (in backward analysis, this occurs when control flows from a single node 
to several nodes). If the abstract domain is convex polyhedra, then this means 
computing the convex hull of the incoming polyhedra. Such an operation may 
induce unrecoverable loss of precision by introducing spurious states that cannot 
occur in concrete program runs. 
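
The loss of precision at a merge point can be illustrated with a minimal sketch. It uses intervals rather than convex polyhedra, which is an assumption made to keep the example short; the names join and contains, and the two branch values, are hypothetical and not taken from PAGAI.

```python
from typing import Tuple

Interval = Tuple[int, int]  # closed interval [lo, hi]


def join(a: Interval, b: Interval) -> Interval:
    """Least upper bound: the smallest interval containing both arguments."""
    return (min(a[0], b[0]), max(a[1], b[1]))


def contains(a: Interval, value: int) -> bool:
    """True if value lies in the closed interval a."""
    return a[0] <= value <= a[1]


# Hypothetical abstract values for x flowing in from two branches.
then_branch: Interval = (0, 1)
else_branch: Interval = (5, 6)
merged = join(then_branch, else_branch)

# x = 3 occurs on neither branch, yet the merged value admits it.
spurious = contains(merged, 3) and not (
    contains(then_branch, 3) or contains(else_branch, 3)
)
print(merged, spurious)
```

The same effect occurs with the convex hull of polyhedra: every point between the two incoming values becomes part of the abstract value after the merge.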

An example of a program where such a loss of precision occurs is depicted in
Fig. 1. In this program, the loop body has two feasible paths that are executed
alternately, depending on a variable "phase". Such programs, with active code
paths depending on global "mode" or "phase" variables, often occur in reactive
systems.

Removing program point $n_0$ breaks all cycles; we are thus primarily
concerned with obtaining an inductive invariant at that point. We consider the
domain of convex polyhedra and thus wish to obtain this invariant as a
polyhedron. Because convex polyhedra form a lattice of infinite height, we use Kleene
iterations (pushing abstract values through control-flow edges) with a
widening scheme, which ensures convergence in finite time to an inductive invariant,
followed by decreasing (narrowing) iterations.
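
As a rough illustration of widening followed by a decreasing iteration, the sketch below analyzes the hypothetical loop "x = 0; while (x < 100) x = x + 1;" at its head. It assumes the interval domain with the standard interval widening instead of polyhedra; the function names are illustrative and do not come from PAGAI.

```python
import math
from typing import Tuple

Interval = Tuple[float, float]  # closed interval, bounds may be infinite


def join(a: Interval, b: Interval) -> Interval:
    return (min(a[0], b[0]), max(a[1], b[1]))


def included(a: Interval, b: Interval) -> bool:
    return b[0] <= a[0] and a[1] <= b[1]


def widen(old: Interval, new: Interval) -> Interval:
    """Standard interval widening: any unstable bound jumps to infinity."""
    lo = old[0] if new[0] >= old[0] else -math.inf
    hi = old[1] if new[1] <= old[1] else math.inf
    return (lo, hi)


def step(x: Interval) -> Interval:
    """Abstract effect of one trip around the loop, seen from its head."""
    init = (0, 0)
    lo, hi = x[0], min(x[1], 99)      # guard x < 100 over integers
    if lo > hi:                       # guard unsatisfiable: only initial state
        return init
    return join(init, (lo + 1, hi + 1))  # body: x = x + 1


def analyze() -> Interval:
    x: Interval = (0, 0)
    while True:                       # ascending iterations with widening
        nxt = step(x)
        if included(nxt, x):
            break
        x = widen(x, nxt)
    return step(x)                    # one decreasing (narrowing) iteration


invariant = analyze()
print(invariant)
```

Widening first overshoots to an unbounded upper bound; the decreasing iteration then recovers the bound implied by the loop guard.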

At program point $n_5$, classical forward abstract interpretation with convex
polyhedra computes the convex hull of three incoming polyhedra over variables
(phase, x, t). This convex hull introduces extra states, unreachable in the
concrete program, for the analysis of the fragment from $n_5$ to $n_8$. When analyzing
the whole loop, these extra states prevent proving x < 100.

To cope with this problem, a solution is to compute disjunctive invariants
at all intermediate nodes: at $n_5$, keep an explicit list of three polyhedra, and
thus obtain a list of nine polyhedra at $n_8$. We pass the convex hull of these
polyhedra to the widening operator at point $n_0$ (which operates on polyhedra,
not on lists of polyhedra). The drawback is that the number of elements in the
lists may grow exponentially with the number of successive tests.

A second solution, equivalent to the preceding with respect to final results
but different in its operation, is to distinguish all nine paths inside the loop (from
$n_0$ to $n_0$), compute the final outcome of each path, and compute the convex hull
of these outcomes. Instead of enumerating all nine paths explicitly, we consider
them in succession, only as needed. We start with an empty polyhedron at $n_0$
(more generally, it should contain initial states at this control point), and process
paths as long as they make this polyhedron grow. The next path to consider is
obtained from a model of an arithmetic formula expressing this growth condition
[16]: if this formula is unsatisfiable, this means there is no such path and thus
the convex hull encompasses the outcome of all paths.

The advantages of this second method over the preceding one are twofold: 
there is no exponentially large list of abstract elements, and the satisfiability 
query for the formula is handed over to a satisfiability modulo theory (SMT) 
solver. Modern SMT-solvers are very efficient; their caching mechanisms may, 
for instance, remember that taking a certain branch in the code is incompatible 
with taking another one (if a Boolean is associated with passing through each 
branch, then this is just a blocking clause inside the underlying SAT-solver). 
The algorithms implemented in PAGAI are variants of this approach of implicitly
representing exponentially-sized sets of paths and enumerating them as
needed.
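
The "process paths only while they make the abstract value grow" loop can be sketched as follows. This is a toy under strong assumptions: the abstract domain is intervals over one variable, each path is a hypothetical (guard, effect) pair, and the SMT query is replaced by a plain scan over the path list, which a real implementation would precisely avoid. Widening is omitted because this example terminates without it.

```python
from typing import Callable, List, Optional, Tuple

Interval = Tuple[int, int]
Path = Tuple[Interval, Callable[[Interval], Interval]]  # (guard, effect)


def meet(a: Interval, b: Interval) -> Optional[Interval]:
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def join(a: Interval, b: Interval) -> Interval:
    return (min(a[0], b[0]), max(a[1], b[1]))


def included(a: Interval, b: Interval) -> bool:
    return b[0] <= a[0] and a[1] <= b[1]


def find_growing_path(paths: List[Path], x: Interval) -> Optional[Interval]:
    """Stand-in for the satisfiability query: return the image of some
    feasible path that is not already covered by x, or None if none exists."""
    for guard, effect in paths:
        pre = meet(x, guard)
        if pre is None:          # path infeasible from the current states
            continue
        post = effect(pre)
        if not included(post, x):
            return post
    return None


def analyze(init: Interval, paths: List[Path]) -> Interval:
    x = init
    while True:
        post = find_growing_path(paths, x)
        if post is None:         # "unsat": x covers the outcome of all paths
            return x
        x = join(x, post)


# Hypothetical loop body with two paths from the loop head back to itself.
PATHS: List[Path] = [
    ((0, 9), lambda iv: (iv[0] + 1, iv[1] + 1)),  # if x <= 9: x = x + 1
    ((10, 10), lambda iv: (0, 0)),                # if x == 10: x = 0
]
result = analyze((0, 0), PATHS)
print(result)
```

When the search returns nothing, the current value is inductive for every path, which mirrors the unsatisfiable case described above.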




Figure 1: Example of a program where the loop behaviour varies depending on a
variable phase.
Name           kLOC   |P_R|
a2ps             55    2012
gawk             59     902
gnuchess         38    1222
gnugo            83    2801
grep             35     820
gzip             27     494
lapack/blas     954   16422
make             34     993
tar              73    1712

Table 1: List of analyzed open-source projects, with their respective number of
lines of code and their number of control points in $P_R$.



3 Implementation 

PAGAI is a prototype interprocedural static analyzer that implements our
recent combined techniques [11] as well as the classical abstract interpretation
algorithm, and the state-of-the-art techniques Path Focusing [16] and Guided
Static Analysis [10].



Abstract domains are provided by the APRON library [12], and include
convex polyhedra (from the builtin Polka "PK" library), octagons, and products
of intervals. It also has an interface with the Parma Polyhedra Library [3], giving
access to more abstract domains (e.g. a reduced product of polyhedra and linear
congruences, producing invariants such as $0 \le x \le 100 \wedge x \equiv 0 \pmod 7$).

For SMT-solving, our analyzer uses Yices or Microsoft Z3 through
their C API. An implementation of communications with the SMT-solver by
textual messages sent through a pipe following the SMT-Lib 2 standard [1] is
underway, and now partially supports Z3, MathSAT 5 and SMTInterpol.



3.1 Analysis algorithm 

For each program, we distinguish a set $P_W$ of suitable widening points by a
simple algorithm: initialize $P_W = \emptyset$ and, for each procedure, compute the strongly
connected components of its control-flow graph using Tarjan's algorithm; the
targets of the back-edges of the depth-first search are added to $P_W$. The
resulting cut set or feedback vertex set is not necessarily minimal, but is sufficient
to disconnect all cycles; more sophisticated techniques are discussed in e.g.
Bourdoncle. It is however unclear whether more advanced selection techniques
would finally yield stronger invariants; the current simple scheme has the
advantage that, when run over a control-flow graph obtained from a structured
program, it marks heads of loops, which is a "natural" choice.

^It is unfortunately impossible to ignore differences between solvers behind the supposedly
standard interface, since different solvers may support slightly different sets of operators and
settings and may return models in different formats.

^It would be possible to obtain a feedback vertex set minimal with respect to inclusion
by successive removal of nodes. Obtaining one of minimal cardinality is an NP-complete



While [11, 16] provide for another set $P_R \supseteq P_W$, with abstract join operators
(as opposed to widenings) being applied at points in $P_R \setminus P_W$, our tool does
not currently implement such a technique, which is meant to reduce the complexity of SMT
formulas at the expense of analysis precision.
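
The selection of widening points as targets of depth-first-search back edges can be sketched as below. This is a simplified illustration, not PAGAI's code: the control-flow graph is a hypothetical dictionary from node names to successor lists, and the sketch performs only the depth-first search, without the strongly-connected-component computation mentioned above.

```python
from typing import Dict, List, Set


def widening_points(cfg: Dict[str, List[str]], entry: str) -> Set[str]:
    """Return the targets of back edges found by a DFS from entry."""
    visited: Set[str] = set()
    on_stack: Set[str] = set()   # nodes on the current DFS path
    targets: Set[str] = set()

    def dfs(node: str) -> None:
        visited.add(node)
        on_stack.add(node)
        for succ in cfg.get(node, []):
            if succ in on_stack:         # edge to an ancestor: back edge
                targets.add(succ)
            elif succ not in visited:
                dfs(succ)
        on_stack.discard(node)

    dfs(entry)
    return targets


# Hypothetical CFG of two nested loops with headers h1 (outer) and h2 (inner).
CFG = {
    "entry": ["h1"],
    "h1": ["h2", "exit"],
    "h2": ["body", "latch"],
    "body": ["h2"],
    "latch": ["h1"],
    "exit": [],
}
points = widening_points(CFG, "entry")
print(sorted(points))
```

On this structured example the selected nodes are exactly the two loop headers, in line with the remark that the scheme marks heads of loops.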

LLVM bitcode is in static single assignment (SSA) form: a given scalar
variable is given a value at a single syntactic point in the program. In concrete
terms, an assignment x=2*x+1; gets translated into a definition $x_2 = 2x_1 + 1$,
with distinct variables $x_1$ and $x_2$ corresponding to the same original variable
x at different points in the program. Because LLVM generally assigns rather
straightforward names (e.g. x.0 for the first renaming of variable x), the user
can map the invariants back to the original source code; an automatic and
more robust back-to-source mapping, based on debugging information, is being
developed.

LLVM makes it easy to follow definition-use and use-definition chains: for a
given variable (say, $x_2$) one can immediately obtain its definition (say, $2x_1 + 1$).
One may see conversion to SSA form as a static precomputation of some of
the symbolic propagations proposed by Miné [15] to enhance the precision of
analyses.

SSA introduces $\phi$-functions at the head of a control node to define variables
whose value depends on which incoming edge was last taken to reach this
control node. For instance, for if (...) { x = 2*x+1; } else { x = 0; }, $x_2$ is
defined as $\phi(2x_1 + 1, 0)$.

In this framework, each variable is uniquely defined as an arithmetic ($+$,
$-$, $\times$, $/$) function of other variables that themselves may not be representable
as arithmetic functions, because they are defined using $\phi$-functions, loads from
memory, return values from function calls, or other numerical operations (e.g.
bitwise operators) that are not representable with our class of basic arithmetic
operations. We may vary the class of arithmetic operations, for instance, by
restricting ourselves to linear ones.

This motivates a key implementation decision of our tool: only those
variables $v_1, \dots, v_n$ that are not defined by arithmetic operations are retained as
coordinates in the abstract domain (e.g. as dimensions in polyhedra), assuming
they are live at the associated control point.

For instance, assume that x, y, z are numerical variables of a program, x is
defined as x = y + z, and x, y, z are live at point p. Instead of having x as
a dimension for the abstract value at point p, we only have y and z. All the
properties for x can be directly extracted from the abstract value attached to
p and the relation x = y + z. This is an optimisation in the sense that there is
redundant information in the abstract value if all of x, y and z are dimensions
of $X_p$. The classical definition of liveness can be adapted to our case:



problem, but Shamir [18] showed that it can be done in linear time for a class of graphs
including reducible graphs, that is, those obtained from structured programs. This latter
algorithm is being implemented.

Definition 1 (Liveness by linearity). A variable v is live by linearity at a
control point p if and only if one of these conditions holds: (i) v is live in p;
(ii) there is a variable v', defined as a linear combination of other variables
$v_1, v_2, \dots, v_n$, such that $\exists i \in \{1, \dots, n\},\ v = v_i$, and v' is live by linearity in p.

Finally, a variable is a dimension in the abstract domain if and only if it 
is live by linearity and it is not defined as a linear combination of program 
variables. 

A basic block of code therefore amounts to a parallel assignment operation
between live-by-linearity variables
$(v_1, \dots, v_n) \mapsto (f_1(v_1, \dots, v_n), \dots, f_n(v_1, \dots, v_n))$;
such operations are directly supported by APRON. This has
three benefits: (i) it limits the number of dimensions in the abstract values,
since polyhedra libraries typically perform worse with higher dimensions; (ii)
the abstract operation for a single path in path-focusing methods is also a (large)
parallel assignment; (iii) as suggested by Miné [15], this approach is more precise
than running abstract operations for each program line separately: for instance,
for y=x; z=x-y; with precondition $x \in [0,1]$, a line-by-line interval analysis
obtains $y \in [0,1]$ and $z \in [-1,1]$, while our "en bloc" analysis symbolically
simplifies $z = x - x = 0$ and thus obtains $z \in [0,0]$.
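
The y=x; z=x-y; example can be replayed with a small sketch. It assumes a toy representation in which a linear expression is a dictionary from variable names to integer coefficients (no constant term); this representation and the helper names are illustrative only and are not how PAGAI or APRON store expressions.

```python
from typing import Dict, Tuple

Interval = Tuple[int, int]
LinExpr = Dict[str, int]  # variable name -> coefficient


def scale(c: int, iv: Interval) -> Interval:
    lo, hi = c * iv[0], c * iv[1]
    return (min(lo, hi), max(lo, hi))


def evaluate(expr: LinExpr, env: Dict[str, Interval]) -> Interval:
    """Interval evaluation of a linear expression, term by term."""
    lo, hi = 0, 0
    for var, coef in expr.items():
        term = scale(coef, env[var])
        lo, hi = lo + term[0], hi + term[1]
    return (lo, hi)


def substitute(expr: LinExpr, defs: Dict[str, LinExpr]) -> LinExpr:
    """Replace defined variables by their definitions and simplify."""
    out: LinExpr = {}
    for var, coef in expr.items():
        for v2, c2 in defs.get(var, {var: 1}).items():
            out[v2] = out.get(v2, 0) + coef * c2
    return {v: c for v, c in out.items() if c != 0}


pre = {"x": (0, 1)}
y_def: LinExpr = {"x": 1}            # y = x
z_def: LinExpr = {"x": 1, "y": -1}   # z = x - y

# Line by line: y gets its own interval, then z is computed from x and y.
env = dict(pre)
env["y"] = evaluate(y_def, env)
z_line_by_line = evaluate(z_def, env)

# En bloc: substitute y's definition first, so z simplifies to 0.
z_en_bloc = evaluate(substitute(z_def, {"y": y_def}), pre)

print(z_line_by_line, z_en_bloc)
```

The line-by-line result forgets that y and x are equal, whereas the substituted expression has no remaining terms and evaluates to the single value 0.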

In the event that a node is reachable only by a single control-flow edge
(which may occur because of dead code, or during the first phases of guided
static analysis), the $\phi$ operation reduces to a copy of the values flowing from
that edge. In this case, our tool just propagates symbolic values through the
predecessor node, without introducing $\phi$-variables.



3.2 Use 

PAGAI takes as input an LLVM bitcode file, and outputs an inductive invariant
for each control point in $P_R$ (typically, the widening points). When a program
contains an assert(...) function call, PAGAI also outputs whether the
statement has been proved. It is also possible to add preconditions about the
variables, etc., using a function assume(...). Both assert and assume are
implemented as C macros. assert(x) is roughly defined as if (!x) __assert_fail();
and the tool just tests for the reachability of __assert_fail(): if it is
unreachable, then the assertion is true. assume works on the same principle,
and is defined as if (!x) __assumption_declared(). Both __assert_fail and
__assumption_declared are noreturn functions, terminating the program
immediately.



3.3 Current limitations of the tool, possible future works 

Our tool currently only operates over scalar variables from the SSA
representation and thus cannot directly cope with arrays or memory accessed through
pointers. We therefore run it after the "memory to registers" (mem2reg)
optimization phase in LLVM, which lifts most memory accesses to scalar variables.
The remaining memory reads are treated as nondeterministic choices, and writes
are ignored. This is a sound abstraction.

^The additional dimensions express linear equalities between variables, which are directly
handled by the polyhedra library. They should therefore cost little assuming some sparse
representation of the constraints. Alas, several libraries, including APRON, compute with dense
vectors and matrices, which means that any increase in dimensions slows computations.

The analysis is currently intraprocedural: function calls are ignored in a
sound way (the return value is a nondeterministic choice, the value of all
variables escaping from the local scope is discarded...). In order to increase precision,
we apply function inlining as an LLVM optimization phase. Plans for
interprocedural analysis include computing input/output summaries for functions as
elements of the abstract domain (e.g. if the function operates over variables x
and y, then one could compute a polyhedron over (x, y, x', y') encompassing all
input-output pairs) or as more general formulas.

Since it is often advantageous to distinguish whether a loop has been
executed at least once, we unroll every loop once, again with an LLVM optimization
phase.

Our tool currently assumes that integer variables are unbounded
mathematical integers (Z) and floating-point variables are real (or rational) numbers.
Techniques for sound analysis of bounded integers, including with wraparound,
and of floating-point operations have been developed in e.g. the Astrée system,
but porting these techniques to our iteration schemes using SMT-solving
requires supplemental work. It is unclear whether one should use bitvector
arithmetic inside the SMT formula, or use mathematical integers with explicit
splits for wraparound.
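
To make the "explicit splits for wraparound" option concrete, the sketch below encodes n-bit signed addition over mathematical integers as three cases (one normal case and two overflow cases) and compares it against a masking-based two's-complement reference. Both function names are hypothetical, and the sketch only illustrates the arithmetic; it does not build any SMT formula.

```python
def wrap_add_cases(x: int, y: int, n: int) -> int:
    """n-bit signed addition as a three-way split over unbounded integers."""
    s = x + y
    half, full = 1 << (n - 1), 1 << n
    if -half <= s < half:    # "normal" path: no wraparound
        return s
    if s < -half:            # underflow path: wrap upwards
        return s + full
    return s - full          # overflow path: wrap downwards


def wrap_add_reference(x: int, y: int, n: int) -> int:
    """Reference result obtained by masking to n bits and sign-adjusting."""
    full = 1 << n
    s = (x + y) & (full - 1)
    return s - full if s >= (full >> 1) else s


# Exhaustive agreement check on 4-bit signed integers.
N = 4
agree = all(
    wrap_add_cases(x, y, N) == wrap_add_reference(x, y, N)
    for x in range(-8, 8)
    for y in range(-8, 8)
)
print(agree, wrap_add_cases(7, 1, N), wrap_add_cases(-8, -1, N))
```

Each case corresponds to one control path in the split encoding, so every addition would contribute up to three paths to the path enumeration.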

Our implementation of path-focusing currently does not use true acceleration
techniques, as proposed by Monniaux and Gonnord [16]. Instead, it simply runs
widening and narrowing iterations on a single path.

We currently analyze each strongly connected component of the control-flow 
graph in topological order; thus each loop nest gets analyzed as a single fixed 
point. An alternative method would be to recursively decompose the strongly 
connected components (for structured programs, this amounts to reconstructing 
the nested loop structure) and summarize the inner loops before analyzing the 
outer loop. 

The analysis is currently only forward, even though nothing in the techniques 
implemented is specific to forward analysis. A possible extension would therefore 
be backward analysis from the __assert_fail () statements. 

^As rightly pointed out by a referee, this is a sound abstraction only if memory safety is 
assumed. The mem2reg preprocessing phase also assumes memory safety, as well as, possibly, 
the absence of other undefined behaviors as defined by the C standard. This is the price 
of using the front-end from a generic compiler: C compilers have the right to assume that 
undefined behaviors do not occur, including in preprocessing and optimization phases. 

^Consider the very simple loop for(int i=0; i<n; i++). The obvious loop invariant is
$0 \le i \le n$, but it is valid only if $n \ge 0$. One would thus need to use disjunctive loop invariants to
obtain $0 \le i \le n \vee (i = 0 \wedge n < 0)$. It is much simpler to unroll the loop once.

^E.g. an operation z = x + y over n-bit signed integers would appear as the disjunction
of three statements $z = x + y \wedge -2^{n-1} \le x + y < 2^{n-1}$, $z = x + y + 2^n \wedge x + y < -2^{n-1}$,
$z = x + y - 2^n \wedge x + y \ge 2^{n-1}$: one "normal" control path and two "overflow" paths.
Benchmark        G/S (C, D, unc.)  PF/S (C, D, unc.)  PF/G (C, D, unc.)  G+PF/PF (C, D, unc.)  G+PF/G (C, D, unc.)  DIS/G+PF (C, D, unc.)
a2ps-4.14        28  4.82  2.55  2  27  4.54  2.55  2  27  6.81  0.28  8.23  13.06  3.40  56
gawk-4.0.0       4  62  3.70  20.37  92  0.92  22.22  22.22  11.11  2.77  16.66  2.77  92
gnuchess-6.0.0   1  51  3.47  6.50  4.33  6.72  3.25  21  6.72  2.38  10.19  2.38  15.18  2.81  3  03
gnugo-3.8        51  4.44  34  11.45  4.27  3  07  12.13  4.27  2  73  10.25  3.07  2.05  17.77  3.76  34  9.05  11.28  4  78
grep-2.9         6.19  47  1.90  4.76  47  3.80  1.90  1  90  7.61  2.38  8.57  2.38  10.47  5.23  47
gzip-1.4         58  7.01  1  75  1.75  12.86  1  16  3.50  8.18  1  16  15.78  2.92  1.16  17.54  1.75  17.54  15.78  1  16
lapack-3.3.1     2  60  5.77  40  3.11  5.06  1  03  4.66  3.47  1  62  7.55  1.06  9.24  1.06  81  16.11  7.09  1  34
make-3.82        2  61  0.52  1.82  6.26  1  82  1.56  8.09  1  82  11.74  0.52  6.52  2.34  1  56  12.27  4.43  78
tar-1.26         4  53  3.27  5.28  2.77  2.77  2.01  75  7.05  0.50  7.05  0.25  9.82  7.05  1  51




Table 2: Results of the comparison of the various techniques described in this
paper: classic Abstract Interpretation (S), Guided Static Analysis (G),
Path-focusing (PF), our combined technique (G+PF), and its version using
disjunctive invariants (DIS). For instance, G/S compares the benefits of Guided Static
Analysis over the classic Abstract Interpretation algorithm. C (resp. D) gives
the percentage of invariants stronger (more precise; smaller with respect to
inclusion) with the left-side (resp. right-side) technique, and "unc."
gives the percentage of invariants that are uncomparable, i.e. neither greater nor
smaller; the code points where both invariants are equal make up the remaining
percentage.

4 Experiments 

We conducted extensive experiments on real-life programs in order to compare
the different techniques, mostly on open-source projects (Table 1) written in C,
C++ and Fortran.

4.1 Precision of the various techniques 

For each program and each pair $(T_1, T_2)$ of analysis techniques, we list the
proportion of control points in $P_R$ where $T_1$ (resp. $T_2$) gives a strictly stronger
invariant, denoted by C (resp. D), and the proportion of control points where
the invariants given by $T_1$ and $T_2$ are uncomparable for the inclusion ordering
(the remainder of the control points are thus those for which both techniques
give the same invariant). We use convex polyhedra as the abstract domain.
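
The way such percentages can be computed is sketched below. The sketch assumes that each invariant is a single interval, so that inclusion is a two-line check, whereas the experiments use convex polyhedra; the control-point names and invariant values are made up for illustration and are not experimental data.

```python
from typing import Dict, Tuple

Interval = Tuple[int, int]


def included(a: Interval, b: Interval) -> bool:
    return b[0] <= a[0] and a[1] <= b[1]


def compare(t1: Dict[str, Interval], t2: Dict[str, Interval]) -> Dict[str, float]:
    """Percentage of control points where t1 is strictly stronger, t2 is
    strictly stronger, the invariants are uncomparable, or they are equal."""
    counts = {"t1_stronger": 0, "t2_stronger": 0, "unc": 0, "equal": 0}
    for point, a in t1.items():
        b = t2[point]
        if a == b:
            counts["equal"] += 1
        elif included(a, b):
            counts["t1_stronger"] += 1
        elif included(b, a):
            counts["t2_stronger"] += 1
        else:
            counts["unc"] += 1
    total = len(t1)
    return {key: 100.0 * value / total for key, value in counts.items()}


# Made-up invariants at four control points for two hypothetical techniques.
T1 = {"p1": (0, 10), "p2": (0, 5), "p3": (0, 10), "p4": (0, 5)}
T2 = {"p1": (0, 10), "p2": (0, 10), "p3": (2, 8), "p4": (3, 8)}
stats = compare(T1, T2)
print(stats)
```

The four percentages always sum to 100, which is why the tables only need to report three of them.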

Let us briefly comment on the results given in more detail in Table 2. Guided
Static Analysis from Gopan and Reps [10] improves the result of the classical
Abstract Interpretation in 2.21% of the control points in $P_R$. Path-focusing
from Monniaux and Gonnord [16] finds better invariants in 4.13% of the control
points.

However, these two techniques also lose precision at a significant number
(4.64% for G, 5.14% for PF) of control points, where they obtain worse results than the
classical algorithm. This result is unexpected, and could be partially explained
by bad behaviour of the widening operator.

Finally, our combined technique gives the most promising results, since it is
statistically more precise than the other techniques. It improves the precision
of the inductive invariant in 8.29% to 9.86% of the control points compared to
the three previous techniques. Still, we obtain worse results in a non-negligible
number of cases (2.02%).

Benchmark          S      G     PF   G+PF    DIS
a2ps-4.14         23     74     34    115    162
gawk-4.0.0        15     46     12     40     50
gnuchess-6.0.0    50    220     81    312    351
gnugo-3.8         77    159     92    766   1493
grep-2.9          41     85     22     65    122
gzip-1.4          22    268     91    303    230
lapack-3.3.1     294   3740   3773   8159  10351
make-3.82         67    108     53    109    257
tar-1.26          37    218    115    253    396

Table 3: Execution time for each technique, expressed in seconds

The analysis using disjunctive invariants greatly improves the precision of
the analysis (for 14.46% of the control points in $P_R$ compared to G+PF), at the
expense of a higher time cost (see Table 3). It also gives worse results in 6.85%
of the points, most probably because of a non-optimal choice of the $\sigma$ function,
detailed in [11].



While experimenting with techniques that use SMT-solving, we encountered
some limitations due to non-linear arithmetic in the analyzed programs. Indeed,
the SMT-solver is not able to decide the satisfiability of some SMT-formulae
expressing the semantics of non-linear programs. In this case, we skipped the
functions for which the SMT-solver returned the "unknown" result. This limitation
occurred very rarely in our experiments, except for the analysis of Lapack/Blas,
where 798 of the 1602 functions have been skipped. Lapack/Blas implements
matrix computations, which use floating-point multiplications. In cases where
the formula is expressed in too rich a logic for the SMT-solver to deal with, a
number of workarounds are possible, including: (i) linearization, as per Miné
[15], which overapproximates nonlinear semantics by linear semantics; (ii)
replacing the results of nonlinear operations by "unknown". Neither is currently
implemented in our tool.

Table 3 gives the execution time of the different analysis techniques. It
is interesting to see that Path-focusing is sometimes faster than the classical
algorithm. This seems due to the fact that this algorithm computes inductive
invariants at a small number of control points compared to classical approaches,
thus leading to fewer operations over abstract values.

4.2 Precision of Abstract Domains 

For each program and each pair $(D_1, D_2)$ of abstract domains, we compare by
inclusion the invariants at the different control points in $P_R = P_W$ (Table 4).






Benchmark        PK/OCT (C, D, unc.)  PK/BOX (C, D, unc.)  OCT/BOX (C, D, unc.)  PK/PKEQ (C, D, unc.)  PK/PKGRID (C, D, unc.)  POLY/POLY* (C, D, unc.)
a2ps-4.14        12.74  .78  21.64  2  13  18.94  .93  90.47  .72  36  .77
gawk-4.0.0       21.34  26.96  17.97  88.76  4.44
gnuchess-6.0.0   5.99  5.78  2.47  12.67  3.68  2  24  14.87  83.43  2.23  .20  3.47
gnugo-3.8        18.75  2.08  2.08  22.50  1.66  1  11  10.86  1.12  71.27  .21  1.29  .47  3.69
grep-2.9         3.30  8.26  8.26  61.74  .44
gzip-1.4         21.16  2.18  32.84  .72  1  45  26.27  80.29  8.75
lapack-3.3.1     11.84  5.67  .85  78.96  2.16  2  99  85.03  94.46  .09  .09  3.22  47  4.25
make-3.82        6.50  4.00  5.50  6.52  4.34  5  97  11.94  46.50  2.29  2.98  5.47
tar-1.26         5.17  4.20  9.70  3.23  97  9.38  62.13  3.31  4.91




Table 4: Results of the comparison of the various abstract domains, when using
the same technique (G+PF). We used as abstract domains Convex Polyhedra
(PK and POLY), Octagons (OCT), intervals (BOX), linear equalities (PKEQ)
and the reduced product of NewPolka convex polyhedra with linear congruences
from the Parma Polyhedra Library (PKGRID). The last column compares
the domain of Convex Polyhedra with the improved widening operator from
Bagnara et al. (POLY*), and Convex Polyhedra using the classical widening
operator (POLY). POLY and POLY* use the PPL. C, D and "unc." are
defined as in Table 2.



Statistically, the domain of convex polyhedra gives the best results, but it
commonly yields weaker invariants than the domains of octagons/intervals; this
is a known weakness of its widening operator [17]. The Octagon domain appears
to be much better than intervals; this is unsurprising since in most programs
and libraries, bounds on loop indices are non-constant: they depend on some
parameters (array sizes etc.).

The Lapack/Blas benchmarks are unusual compared to the other programs. 
These libraries perform matrix computations, using nested loops over indices; 
such programs are the prime target for polyhedral loop optimization techniques 
and it is therefore unsurprising that polyhedra and octagons perform very well 
over them. 

The analysis of linear equalities (PKEQ) runs very fast compared to
other abstract domains, but yields very imprecise invariants: it only detects
relations of the form $\sum_i a_i x_i = C$ where the $a_i$ and $C$ are constants.

Using the reduced product of convex polyhedra with linear congruences
(PKGRID) improves the analysis by 2.52%.

Finally, we evaluated the benefits of the improved version of the widening
operator for convex polyhedra from Bagnara et al. [1], compared to the classical
widening. We found that the improved version yields
more precise invariants for 3.70% of the control points in $P_R$.



4.3 Future Work 

It is not entirely relevant to compare by inclusion the abstract values obtained
by the various analysis techniques. Indeed, a slightly smaller invariant may
not always be useful to prove the desired properties. Future work should thus
include experiments with better comparison metrics, such as (i) the number of
assert statements that have been proved in the code (unfortunately, it is difficult to find
good benchmarks or real-life programs with many assert statements); (ii) the
number of false alarms in a client analysis that detects array bound violations,
arithmetic overflows, etc.